Bugfix/htlc flush shutdown #8167
Conversation
Force-pushed from 0b12f1d to a82f325
Great PR! MVar certainly seems very powerful, but I have to confess that I still need to study the implementation more closely to make sure there aren't any obvious pitfalls. In the past, we've realized that sync.Cond can take a bit of care to ensure it's used properly.
Left some comments in line re another area we need to cancel back (mailbox.FailAdd when handling downstream requests). Also I think we need to add a new continuation flow from the link to the chan closer. After shutdown is sent+recv'd, and only after the flush has fully completed, should the state machine in the ChanCloser be allowed to progress. I think we can do this by adding the continuation logic to the closure we send as a callback to Flush.
fn/mvar.go (Outdated)
	"sync/atomic"
)

// MVar[A any] is a structure that is designed to store a single value in an API
I think this could use some more detail in the commit body re the rationale/usage for such a variable. E.g.: does the M prefix have any intended semantic meaning?
Would you rather have it in the commit body or just in the godoc itself? I believe the "M" is for "Mutable". The API comes from here. Probably silly in this context since everything is mutable in go. I'm not attached to the name.
fn/mvar.go (Outdated)
// a non-nil out it means we can actually finish the Take.
m.mutex.Lock()
for val = m.value.Swap(nil); val == nil; val = m.value.Swap(nil) {
	m.tCond.Wait()
Do we need to thread through a context.Context here to allow any blocked callers to automatically exit if a shutdown is initiated?
Probably a good idea. We can make it optional and if they don't provide one then we just block til the heat death of the universe.
Upon second look, even if we accept a context here it raises the question of whether we want to have an error value that can be returned from this function. It would impair the interface unless it was acceptable to panic, which I suspect it wouldn't be. Otherwise we now have an error type we need to add to the interface which would have to be checked all the time. If you think it's worth it to have it be cancellable like that but less ergonomic, then we can.
Wouldn't we get a deadlock if there's no way to stop waiting?
Isn't that true of a number of other synchronization tools too, including go channels? It is true that a misuse of this API will absolutely result in a deadlock. Suppose we do want to make these waits cancellable, though, are we willing to panic the thread, or do we really want to check for async cancellation everywhere? I see an argument for both.
htlcswitch/link.go (Outdated)
// net new HTLCs rather than forwarding them. This is the first
// opportunity we have to bounce invalid HTLC adds without
// doing a force-close.
if l.IsFlushing() {
👍
We should only do this cancel back logic if the state of fwdPkg is FwdStateLockedIn. If the state is FwdStateProcessed, the htlc may already have been sent to the outgoing link in a prior invocation of processRemoteAdds and canceling back would be unsafe.
htlcswitch/interfaces.go (Outdated)
@@ -137,7 +137,7 @@ type ChannelUpdateHandler interface {
 	// ShutdownIfChannelClean shuts the link down if the channel state is
 	// clean. This can be used with dynamic commitment negotiation or coop
 	// close negotiation which require a clean channel state.
-	ShutdownIfChannelClean() error
+	ShutdownHtlcManager()
godoc comment not updated.
But if we have the new flush call, then in situations where there's a flush to shut down, can't we just add that context to an arg in Flush? This way the caller doesn't need to reach down as much into the implementation details of how the link works. Instead, if the extra arg is set, then the link starts to shut itself down when it's safe to do so.
Technically, the link should still be "up" (but in the fully flushed state) until shutdown is sent and recv'd by both parties, as there's a provision that lets either side send a new update_fee to adjust the current fee level.
peer/brontide.go (Outdated)
// internal maps. We don't call WipeChannel as the channel must
// still be in the activeChannels map to process coop close
// messages.
p.cfg.Switch.RemoveLink(cid)
IIUC, we need to insert the continuation signal here. In other words, in the case where we're trying to shutdown to do co-op close, the control flow (when we receive the request from our remote peer) needs to block after sending shutdown (it shouldn't send closing_signed until the shutdown is finished). For the RPC flow, IMO we still want to just hard exit here as default behavior. We can add a new sync flow with a new API or arg to the existing API.
Force-pushed from 334dd7e to 7c94e0d
Replaces #8145
Force-pushed from 4ef3119 to 91f38ae
Force-pushed from 4e8698e to c7462df
Trying to learn more about MVar and got a bit confused - how is this different from a go channel? I guess it has a Read method - but other than that IMO a go channel can do a better job? For instance, I can re-implement the methods of MVar like this,
type GVar[A any] struct {
	c chan A
}

func NewEmptyGVar[A any]() GVar[A] {
	return GVar[A]{c: make(chan A, 1)}
}

func NewGVar[A any](a A) GVar[A] {
	v := GVar[A]{c: make(chan A, 1)}
	v.c <- a
	return v
}

// Take will wait for a value to be put into the GVar and then immediately
// take it out.
func (g *GVar[A]) Take() A {
	return <-g.c
}

// TryTake is the non-blocking version of Take, it will return a
// None() Option if it would have blocked.
func (g GVar[A]) TryTake() Option[A] {
	select {
	case a := <-g.c:
		return Some(a)
	default:
		return None[A]()
	}
}

// Put will wait for the GVar to be made empty and will immediately replace it
// with the argument.
func (g *GVar[A]) Put(a A) {
	g.c <- a
}

// TryPut is the non-blocking version of Put and will return true if the GVar is
// successfully set.
func (g *GVar[A]) TryPut(a A) bool {
	select {
	case g.c <- a:
		return true
	default:
		return false
	}
}
I guess a more fundamental question is, why should we use this MVar which is built for Haskell to control concurrency, while Go insists we should share memory by communicating? Meanwhile I find this guy's answer interesting as Haskell also does CSP.
Aside from that, ideally I think links should be managed by switch and brontide would only need to interact with switch. When the channel is shutting down, brontide just calls an interface method switch.CloseLink(linkID), which returns a channel that brontide will listen to, and it will be closed when the link is fully cleaned up.
htlcswitch/interfaces.go (Outdated)
// the channel state reaches zero htlcs. This hook will only ever be
// called once. This function returns a FlushId that can be used to
// preemptively remove this hook.
OnFlushedOnce(func()) FlushHookID
I think we are exposing too many implementation details of link; instead we should seek designs that can deepen the interface methods.
Do you have an algebra in mind for the link? This API was one that I came up with to strike a balance between hiding implementation details and not completely redesigning LND. The link, in my interpretation, is a coroutine that drives the channel state machine and manages "link state" (the state of the connection that cannot be reflected in the commitment tx).
I don't think it is unreasonable for us to inject lifecycle hooks into the link. There are numerous provisions in the spec that state that certain messages should only be sent at certain "blessed" junctions. I picked a couple of those out here:
- Flushed: there are no live HTLCs
- Commit(out): we just committed to a new state
- Commit(in): we just received a new commitment
There are a few others that may be interesting but not relevant here:
- Revoke(in): state n-1 is now invalid and new things are now safe that weren't before
- Revoke(out): state n-1 is now invalid and there are things that were safe before that aren't now
- Stop: the link state is consistent and will not accept any new updates, completing the lifecycle
- Start: the link is now live and capable of processing updates
There are probably more but I think this is a very clean external API for the link given what I understand about its role.
Do you have an algebra in mind for the link?
Still need to finish my TLA+ course so no algebra here. However I do think these are really nice structured states that we can build into the link's lifecycle, as IMO assigning a state to link and building actions based on it can really help us clean things up here.
That being said, I think the current design functions; I'm just not sure if there's a better alternative. For instance, when I see we now need to call link.OnCommitOnce, link.DisableAdd and switch.RemoveLink, the mixed API calls to link and switch make me think we should only expose it at the switch level, maybe something like switch.DisableLink so we could expose less.
I think we should have a conversation about what subsystems "own" each other. The way I've conceptualized everything right now is that we have:
Brontide <-> ChannelLink <-> Switch <-> ChannelLink <-> Brontide
and that each of these subsystems has a "peer" relationship with one another. That said, I can imagine and can endorse a design where the Switch "owns" the ChannelLinks. This makes sense to a degree because we already rely on the Switch to be able to look up the ChannelLink from the ChanID so there are already sketches of this relationship in the codebase today.
I don't think we should remove the API I put in here since I think it accurately describes what the role of the link is, but making the Brontide responsible for coordinating the lifecycle continuations is probably neither necessary nor ideal.
Does this match what you are thinking?
Yep, I don't think we need to remove the API either, on either the interface level or the implementation level. But because "making the Brontide responsible for coordinating the lifecycle continuations is probably neither necessary nor ideal", I'd say not to expose link in brontide. Sharing some notes I've taken previously,
- When we init the shutdown:
- request shutdown via `rpcServer.CloseChannel`
- link is closed via `switch.CloseLink`, which creates a `htlcswitch.ChanClose` message that has a channel `Updates` that's used by the `chanCloser` to notify the channel is closed, then,
- `switch` calls `s.cfg.LocalChannelClose`, mounted to `brontide.HandleLocalCloseChanReqs`
- brontide calls `handleLocalCloseReq`:
- 1. brontide creates a channel closer via `p.createChanCloser` and stores it in `p.activeChanCloses`, `htlcswitch.ChanClose` message is passed to this closer.
- 2. brontide calls `chanCloser.ShutdownChan()`, in which,
- `chanCloser` calls `initChanShutdown`, in which,
- calls `c.cfg.DisableChannel`, that mounts to `ChanStatusMgr.RequestDisable`, in which,
- `ChanStatusManager` calls `processDisableRequest`, in which,
- it creates a channel update and calls `m.cfg.ApplyChannelUpdate`, mounts to `s.applyChannelUpdate` that sends a `ChannelUpdate` via the `authGossiper`
- TODO: it's marked as broadcasted with a `nil` closing tx via `MarkCoopBroadcasted`, the reason being to keep the `listchannels` rpc consistent <- needs revisiting
- `chanCloser` sets its state to `closeShutdownInitiated`
- 3. `brontide` calls `link.DisableAdds` which disables the outgoing link, which forbids us sending out htlcs to the peer.
- 4. `brontide` sends the `shutdown` message to the peer
- rpcserver now listens to `htlcswitch.ChanClose.Updates` which sends signals when the channel closing process has updates.
This is just the first part, where we send out the shutdown message; notice how we jump from switch to brontide to link, not to mention the subsequent handling of the peer's shutdown and closing_signed messages...
htlcswitch/link.go (Outdated)
// addGateState is a piece of channelLink state that tracks whether
// we are allowed to add htlc's on either side of the link.
type addGateState struct {
	outgoingAddsEnabled atomic.Bool
Why do we need to have this gate state? So we can handle an update_add followed by a shutdown?
Yes. The link should be able to handle any sequence of instructions that will not create a protocol violation. Asking this to be the responsibility of the caller is not a great choice IMO.
If you disagree, I'd ask us to think about which subsystem is responsible for ensuring protocol adherence. It's fine if the link is supposed to be lower level, but afaict the link IS the channel state machine and as such should keep track of this state.
Didn't have an opinion when I asked, purely out of curiosity.
peer/brontide.go (Outdated)
@@ -3617,6 +3617,22 @@ func (p *Brontide) StartTime() time.Time {
 // message is received from the remote peer. We'll use this message to advance
 // the chan closer state machine.
 func (p *Brontide) handleCloseMsg(msg *closeMsg) {
+	link := p.fetchLinkFromKeyAndCid(msg.cid)
+	if link == nil {
+		// TODO(proofofkeags): is this ever a valid situation?
There are two situations where this can happen:
- somebody sends us a message for a link that doesn't exist
- the link was removed via RemoveLink from an earlier closing_signed and we're receiving another one. This can happen when they don't agree with our proposed fee, and the empty if statement here will prevent us from completing coop close. To fix this, we need to check if we have an active chanCloser and then process the closing_signed.
Normally the latter case shouldn't happen because LND fast-completes coop close in 3 messages, but other implementations are allowed to do whatever they want.
So it seems then the appropriate thing to do here is to just ignore any link lifecycle registrations if there is no valid link for the CID
I don't follow, can you elaborate? Currently in the code as written, link can be nil here and we'd still need to process the closing_signed. I think we just need to process the closing_signed even if the link is down
I currently am fetching the link because later in this function I need to add link lifecycle hooks (introduced in an earlier commit in this PR). It seems like if the link has already been removed then we wouldn't need to insert those hooks. I will update the PR to reflect this understanding and maybe the code will be clearer than my explanation.
Channels do a better job up until the point where you try to read the front of the channel without changing state. In fact my initial implementation looks exactly like the implementation you wrote out.
The current version of this PR doesn't actually use MVar anymore and if I get to the end of this task and find I don't need it I'll cherry pick the commit and put it in a different draft PR and drop it from this PR. I put it in originally because an earlier implementation (which had fundamental design issues) made use of it and I chose to use it there because it was the first tool I reached for. In particular, I wanted the Read/TryRead functionality, something go channels do not provide.
Maybe, but I don't currently intend to do that in this PR. Happy to discuss this architectural change as I do believe that ownership and message flow are not ideal at the moment. I think the fact that the link is both driven by the brontide and the switch is manageable but I agree that there may be opportunities to simplify things and fix a lot of latent bugs by mediating all link access through the switch.
Force-pushed from cc17c90 to ccfa32c
In order to handle shutdown requests when there are still HTLCs on the link, we have to manage the shutdown process via the link's lifecycle hooks. This means we can't use the simple `tryLinkShutdown` anymore and instead queue a `Shutdown` message at the next opportunity to do so -- when we send our next `CommitSig`.
We don't need to try a link shutdown when the chan closer is fetched since, by this commit, the only callsite manages the shutdown semantics. After removing the call to tryLinkShutdown, we no longer need the function at all.
When the link is flushing in the incoming direction, it means adds are invalid. The best chance we have at dealing with this is to drop the connection. This should roll back the channel state to the last CommitSig. If the remote has already sent a CommitSig we haven't received yet, channel state will be re-synchronized with a ChannelReestablish message upon reconnection and the protocol state that caused us to flush the link will be rolled back. In the event that there was some non-deterministic behavior in the remote that caused them to violate the protocol, we have a decent shot at correcting it this way, since reconnecting will put us in the cleanest possible state to try again.
Force-pushed from fcdc7a2 to 49b7d2f
Force-pushed from 49b7d2f to 3439385
@coderabbitai review
// Have alice attempt to close the channel.
closeClient := alice.RPC.CloseChannel(&lnrpc.CloseChannelRequest{
	ChannelPoint: chanPoint,
	NoWait:       true,
Nice!
Review Status
Actionable comments generated: 16
Configuration used: CodeRabbit UI
Commits
Files that changed from the base of the PR and between 41c167d and 343938540162e2cd2a25c18f47032580c351f0e9.
Files ignored due to path filters (4)
- go.mod is excluded by !**/*.mod
- go.sum is excluded by !**/*.sum
- lnrpc/lightning.pb.go is excluded by !**/*.pb.go
- lnrpc/lightning.swagger.json is excluded by !**/*.json
Files selected for processing (20)
- docs/release-notes/release-notes-0.18.0.md (2 hunks)
- htlcswitch/interfaces.go (1 hunks)
- htlcswitch/link.go (15 hunks)
- htlcswitch/link_test.go (3 hunks)
- htlcswitch/mock.go (1 hunks)
- itest/list_on_test.go (1 hunks)
- itest/lnd_coop_close_with_htlcs_test.go (1 hunks)
- itest/lnd_mpp_test.go (1 hunks)
- itest/lnd_psbt_test.go (3 hunks)
- itest/lnd_routing_test.go (1 hunks)
- itest/lnd_wallet_import_test.go (1 hunks)
- itest/lnd_zero_conf_test.go (1 hunks)
- lnrpc/lightning.proto (2 hunks)
- lntest/harness.go (1 hunks)
- lnwallet/chancloser/chancloser.go (13 hunks)
- lnwallet/chancloser/chancloser_test.go (4 hunks)
- lnwallet/channel.go (1 hunks)
- peer/brontide.go (7 hunks)
- peer/test_utils.go (4 hunks)
- rpcserver.go (2 hunks)
Files skipped from review due to trivial changes (5)
- itest/lnd_mpp_test.go
- itest/lnd_psbt_test.go
- itest/lnd_routing_test.go
- itest/lnd_wallet_import_test.go
- itest/lnd_zero_conf_test.go
Additional comments: 51
itest/lnd_coop_close_with_htlcs_test.go (9)
- 14-19: The test case description is clear and concise.
- 25-27: The OpenChannel method is used correctly to set up a channel with the specified amount.
- 32-44: The setup for the HODL invoice is correct, using a random preimage and its hash.
- 45-46: The AddHoldInvoice and SubscribeSingleInvoice RPC calls are used appropriately to create and monitor the invoice.
- 53-58: The SendPaymentRequest is constructed correctly with the necessary fields, including a timeout and fee limit.
- 64-68: The CloseChannel RPC call is correctly used with the NoWait flag set to true, indicating the channel should close without waiting for HTLCs to settle.
- 73-78: The SettleInvoice call is used correctly to settle the invoice, and the subsequent receipt of the update is handled properly.
- 85-96: The assertions for the closePending update are correct, ensuring that the update is received and the transaction ID is valid.
- 98-102: The test correctly waits for the close transaction to be mined and for the channel to be fully closed.
itest/list_on_test.go (1)
- 569-572: The addition of the new test case "coop close with htlcs" to the allTestCases slice is correct.
docs/release-notes/release-notes-0.18.0.md (2)
- 68-71: The release notes correctly document the change allowing shutdowns while HTLCs are in-flight, ensuring compliance with BOLT 2.
- 131-135: The addition of the new flag to the CloseChannel RPC method is documented correctly, explaining its purpose and usage.
- 195-195: The
CompleteCooperativeClose
method in themockChannel
struct is correctly returning a non-nilwire.MsgTx
pointer, which is necessary for the test to proceed.- 399-399: The test
TestMaxFeeBailOut
correctly simulates the scenario where the negotiated fee rate rises above the maximum fee, and the error handling is appropriate.- 531-539: The
ReceiveShutdown
andBeginNegotiation
methods are used correctly in the testTestTaprootFastClose
, reflecting the changes in the method signatures and control flow.- 544-552: The test continues to correctly simulate the cooperative close process with the updated method calls and assertions.
- 557-562: The test ensures that the
ReceiveClosingSigned
method processes the closing signed message correctly and that the channel closing transaction is non-nil.- 603-607: The final steps of the cooperative close process are correctly tested, with the appropriate assertions and error handling.
htlcswitch/interfaces.go (3)
- 137-161: The methods EnableAdds, DisableAdds, IsFlushing, OnFlushedOnce, and OnCommitOnce have been added to the ChannelUpdateHandler interface. Ensure that all implementations of this interface have been updated to include these new methods.

Verification successful: The new methods EnableAdds, DisableAdds, IsFlushing, OnFlushedOnce, and OnCommitOnce have been implemented in the ChannelUpdateHandler interface as evidenced by the output from the shell scripts. All implementations of this interface have been updated to include these new methods.

- 164-172: The introduction of CommitHookID and FlushHookID types is noted. Ensure that these types are used appropriately wherever hooks are managed within the ChannelUpdateHandler's lifecycle.
- 174-184: The LinkDirection type and its constants Incoming and Outgoing have been added. Verify that these constants are used consistently across the codebase and replace any magic boolean values that represent direction.

Scripts executed for the analysis:

#!/bin/bash
# Search for all implementations of ChannelUpdateHandler to verify new methods are implemented.
ast-grep --lang go --pattern $'type $_ struct { $$$ }
func ($_ *$_) EnableAdds($_ $_) $_ { $$$ }
func ($_ *$_) DisableAdds($_ $_) $_ { $$$ }
func ($_ *$_) IsFlushing($_ $_) $_ { $$$ }
func ($_ *$_) OnFlushedOnce($_) { $$$ }
func ($_ *$_) OnCommitOnce($_, $_) { $$$ }'

Length of output: 926

#!/bin/bash
# Search for implementations of each new method on the ChannelUpdateHandler interface.
ast-grep --lang go --pattern $'func ($_ *$_) EnableAdds($_ $_) $_ { $$$ }'
ast-grep --lang go --pattern $'func ($_ *$_) DisableAdds($_ $_) $_ { $$$ }'
ast-grep --lang go --pattern $'func ($_ *$_) IsFlushing($_ $_) $_ { $$$ }'
ast-grep --lang go --pattern $'func ($_ *$_) OnFlushedOnce($_) { $$$ }'
ast-grep --lang go --pattern $'func ($_ *$_) OnCommitOnce($_, $_) { $$$ }'

Length of output: 6422
peer/test_utils.go (5)
- 461-463: The
mockUpdateHandler
struct has been updated with new fieldsisOutgoingAddBlocked
andisIncomingAddBlocked
. Ensure that these fields are used correctly in the methodsEnableAdds
,DisableAdds
, andIsFlushing
.- 513-526: The
EnableAdds
method uses atomic operations to manage the state of adding HTLCs. Ensure that all calls to this method handle the returned error appropriately.Verification successful
The verification process has confirmed that the calls to the
EnableAdds
method within the test suite handle the returned error appropriately, using assertions to check the error's value. There are no other calls to this method outside of the test files according to the search results.* 528-541: The `DisableAdds` method also uses atomic operations and returns an error if adds are already disabled. Similar to `EnableAdds`, verify that all calls to `DisableAdds` handle the returned error correctly.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for calls to EnableAdds to verify error handling. ast-grep --lang go --pattern $'$_.$EnableAdds($_)'Length of output: 51
Script:
#!/bin/bash # Search for calls to EnableAdds to verify error handling without specifying the method receiver. rg 'EnableAdds\('Length of output: 676
Verification successful
The ripgrep search has successfully identified several calls to the
DisableAdds
method within the Go files of the codebase. The results show that this method is used in various places, includingpeer/brontide.go
,htlcswitch/mock.go
,htlcswitch/interfaces.go
, andhtlcswitch/link_test.go
, among others.To verify that the error returned by
DisableAdds
is handled correctly in each instance, we would need to examine the surrounding code for each occurrence. However, the output provided does show at least one instance inhtlcswitch/link_test.go
where the error is being checked:require.Nil(t, aliceLink.DisableAdds(dir)) require.NotNil(t, aliceLink.DisableAdds(dir))This indicates that in the test code, the error from
DisableAdds
is being handled by asserting that it isnil
ornot nil
as expected in the tests.Given this information, it can be inferred that the review comment regarding the verification of error handling for calls to
DisableAdds
is correct, and the calls to this method in the codebase do handle the returned error appropriately, at least in the test cases shown.* 543-552: The `IsFlushing` method checks the state of HTLC adds. Confirm that this method is used correctly wherever the state of HTLC adds needs to be checked.Scripts Executed
The following scripts were executed for the analysis:

Script:

    #!/bin/bash
    # Search for calls to DisableAdds to verify error handling.
    ast-grep --lang go --pattern $'$_.$DisableAdds($_)'

Length of output: 52

Script:

    #!/bin/bash
    # Use ripgrep to search for calls to DisableAdds to verify error handling.
    rg 'DisableAdds' --go

Length of output: 450

Script:

    #!/bin/bash
    # Correct the ripgrep command to search for calls to DisableAdds in Go files.
    rg 'DisableAdds' -g '*.go'

Length of output: 1136

Verification successful

The `IsFlushing` method is used correctly in the codebase to check the state of HTLC adds, as evidenced by its usage in tests and operational logic within the `htlcswitch` package.

* 554-562: The methods `OnFlushedOnce` and `OnCommitOnce` are designed to execute hooks. Since they are called immediately if conditions are met, ensure that the hooks passed to these methods are idempotent and safe to be called at any time.

Scripts Executed

The following scripts were executed for the analysis:

Script:

    #!/bin/bash
    # Search for calls to IsFlushing to ensure it's used correctly.
    ast-grep --lang go --pattern $'$_.$IsFlushing($_)'

Length of output: 51

Script:

    #!/bin/bash
    # Search for the string "IsFlushing" to find references to the method in the codebase.
    rg 'IsFlushing'

Length of output: 1423
lnwallet/chancloser/chancloser.go (9)

- 14-14: The addition of the `github.com/lightningnetwork/lnd/fn` import suggests the introduction of functional programming constructs, such as options, which can be beneficial for handling optional values and avoiding nil pointer dereferences.
- 74-77: The introduction of the `closeAwaitingFlush` state is consistent with the PR's objective to ensure that shutdown does not proceed while HTLCs are still pending. This aligns with the BOLT 2 specification.
- 207-210: The addition of `cachedClosingSigned` using the `fn.Option` type is a good use of functional programming to handle optional values. This should help in managing the race condition mentioned in the comments.
- 474-605: The `ReceiveShutdown` method has been refactored into a state machine pattern, which is a good practice for managing complex state transitions. However, ensure that all possible states are handled correctly and that the state transitions are valid.
- 606-671: The `BeginNegotiation` method correctly transitions the state machine to the negotiation phase and handles the initiation of fee negotiation. Ensure that the logic for when to generate a `ClosingSigned` message is correct and that the state transitions are valid.

Verification successful

The output from the latest script provides the contents of the `BeginNegotiation` method within the `ChanCloser` struct from the `lnwallet/chancloser/chancloser.go` file. The snippet shows the method's logic for transitioning the state machine to the negotiation phase and initiating fee negotiation.

Based on the provided output, the `BeginNegotiation` method performs the following actions:

- It initializes a variable `noClosingSigned` with a `None` option of `lnwire.ClosingSigned`.
- It uses a switch statement on `c.state` to handle different states.
- In the `closeAwaitingFlush` case, it marks the channel as cooperatively closed with a nil transaction to ensure consistency with the `listchannels` RPC.
- It then sets the state to `closeFeeNegotiation`.
- If the channel is not the initiator, it checks for a cached remote offer to process.
- If the channel is the initiator, it proposes an initial `ClosingSigned` message.

The logic for when to generate a `ClosingSigned` message is correct, and the state transitions appear to be valid based on the expected behavior described in the PR objectives and the AI-generated summary. The method transitions the state machine to the negotiation phase and handles the initiation of fee negotiation as expected.

Given this information, the review comment is verified: the method does indeed correctly transition the state machine to the negotiation phase and handle the initiation of fee negotiation.
* 752-792: > 📝 **NOTE**
  > This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [673-878]

Scripts Executed

The following scripts were executed for the analysis:

Script:

    #!/bin/bash
    # Verify that the BeginNegotiation method correctly transitions the state machine and initiates fee negotiation.
    ast-grep --lang go --pattern $'func (c *ChanCloser) BeginNegotiation($_) { $$$ switch c.state { $$$ } }'

Length of output: 115

Script:

    #!/bin/bash
    # Adjusting the ast-grep pattern to correctly match the BeginNegotiation method in the Go codebase.
    ast-grep --lang go --pattern $'func (c *ChanCloser) BeginNegotiation() { $$$ switch c.state { $$$ } }'

Length of output: 113

Script:

    #!/bin/bash
    # Using ripgrep to search for the BeginNegotiation method within the ChanCloser struct.
    rg 'func \(c \*ChanCloser\) BeginNegotiation\('

Length of output: 125

Script:

    #!/bin/bash
    # Output the contents of the BeginNegotiation method from the chancloser.go file.
    rg -A 30 'func \(c \*ChanCloser\) BeginNegotiation\(' lnwallet/chancloser/chancloser.go

Length of output: 1154
The `ReceiveClosingSigned` method has been significantly refactored. It is crucial to ensure that the handling of taproot channels and the fee negotiation logic are correct. The state machine pattern is used here as well, so verify that the state transitions are valid and that the fee negotiation terminates appropriately.

Verification successful

The `ReceiveClosingSigned` method in the `chancloser.go` file has been verified. It correctly handles taproot channels and fee negotiation logic, with appropriate state transitions and termination of the fee negotiation process. The method's implementation aligns with the expected behavior described in the review comment.

* 11-17: > 📝 **NOTE**
  > This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [883-999]

Scripts Executed
The following scripts were executed for the analysis:

Script:

    #!/bin/bash
    # Verify that the ReceiveClosingSigned method handles taproot channels and fee negotiation correctly.
    ast-grep --lang go --pattern $'func (c *ChanCloser) ReceiveClosingSigned($_, $_) { $$$ switch c.state { $$$ } }'

Length of output: 123

Script:

    #!/bin/bash
    # Verify that the ReceiveClosingSigned method handles taproot channels and fee negotiation correctly.
    # Correcting the pattern by removing the unnecessary single quote at the end.
    ast-grep --lang go --pattern $'func (c *ChanCloser) ReceiveClosingSigned($_, $_) { $$$ switch c.state { $$$ } }'

Length of output: 123

Script:

    #!/bin/bash
    # Broaden the search criteria to find the ReceiveClosingSigned method within the ChanCloser struct.
    ast-grep --lang go --pattern $'func (c *ChanCloser) ReceiveClosingSigned($$$)'

Length of output: 14806
The `proposeCloseSigned` method appears to handle the creation of a `ClosingSigned` message correctly. It is important to ensure that the logic for handling taproot channels and the saving of prior fee offers is correct.
- 11-17: > 📝 NOTE
  > This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [883-999]

The helper functions for fee negotiation, such as `feeInAcceptableRange`, `ratchetFee`, and `calcCompromiseFee`, are critical for the correct operation of the channel closure process. Ensure that the logic for fee negotiation is sound and that the functions work as intended.
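To make the role of these helpers concrete, here is a hedged sketch of the kind of predicates they encode. The acceptance threshold and midpoint compromise below are invented for illustration; lnd's actual `feeInAcceptableRange` and `calcCompromiseFee` use different rules:

```go
package main

import "fmt"

// feeInAcceptableRange is a sketch: accept the counterparty's offer if it
// is within 50% of our own proposal. The real predicate differs in detail.
func feeInAcceptableRange(ours, theirs int64) bool {
	diff := ours - theirs
	if diff < 0 {
		diff = -diff
	}
	return diff*2 <= ours
}

// calcCompromiseFee is a sketch: meet in the middle of the two offers so
// that repeated rounds converge rather than oscillate.
func calcCompromiseFee(ours, theirs int64) int64 {
	return (ours + theirs) / 2
}

func main() {
	fmt.Println(feeInAcceptableRange(1000, 700)) // prints true (within 50%)
	fmt.Println(feeInAcceptableRange(1000, 400)) // prints false
	fmt.Println(calcCompromiseFee(1000, 700))    // prints 850
}
```

Whatever the exact formulas, the property the review asks to verify is the same: each round's compromise must move strictly toward the counterparty's offer so negotiation terminates.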
- 11-17: > 📝 NOTE
  > This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [883-999]

The `ParseUpfrontShutdownAddress` function is a utility function that should correctly parse and validate the upfront shutdown address. Ensure that the function handles all edge cases and returns appropriate errors.

htlcswitch/link.go (9)
- 359-378: The addition of the `isOutgoingAddBlocked`, `isIncomingAddBlocked`, `flushHooks`, `outgoingCommitHooks`, and `incomingCommitHooks` fields to the `channelLink` struct introduces new state management for HTLCs and commit hooks. Ensure that these new states are properly synchronized across goroutines to prevent data races.
- 383-432: The `hookMap` type and associated methods `newHookMap`, `alloc`, and `invoke` are added for managing hooks in the channel lifecycle. Verify that the `hookMap` is used in a thread-safe manner, since the warning comment indicates it is not thread-safe.
- 1568-1577: The check for `IsFlushing(Outgoing)` in `handleDownstreamUpdateAdd` is a good addition for protocol compliance. However, ensure that the failure mode is correctly handled and that the peer is properly notified of the violation.
- 1897-1929: The check for `IsFlushing(Incoming)` in the `handleUpstreamMsg` method when processing an `UpdateAddHTLC` message is critical for maintaining protocol correctness. Ensure that the link failure and disconnection are correctly handled in this scenario.
- 2180-2185: The invocation of `incomingCommitHooks` after sending a revocation is a critical point in the channel's state transition. Verify that the hooks are invoked correctly and that the channel state is consistent after this operation.
- 2223-2232: The conditional logic for sending a commitment signature in response to a received `CommitSig` message is important for maintaining channel state synchronization. Verify that the condition `l.channel.OweCommitment()` is correctly determining when to send a commitment signature.
- 2235-2242: The invocation of `flushHooks` after processing a `RevokeAndAck` message is crucial for the channel's state management. Verify that the hooks are invoked only when the channel state is clean and that the state is correctly updated after this operation.
- 2541-2546: The invocation of `outgoingCommitHooks` after sending a `CommitSig` message is a critical operation. Verify that the hooks are invoked correctly and that the channel state is consistent after this operation.
- 2990-2992: The error handling for `AddMessage` to the mailbox in `HandleChannelUpdate` should be verified. Ensure that the error does not indicate a deeper issue that could affect the stability of the channel link.

lnrpc/lightning.proto (2)
- 2049-2053: The `no_wait` field has been added to the `CloseChannelRequest` message to allow for non-blocking channel closures. This change aligns with the PR objectives and user-provided facts.
- 2060-2060: The `InstantUpdate` message has been added within the `CloseStatusUpdate` message. However, it is currently empty and does not contain any fields or documentation. This could be intentional for future extensions, but it's worth verifying to ensure it's not an oversight.

Verification successful

The `InstantUpdate` message is present in the `lightning.proto` file and has been incorporated into the generated Go and Swagger JSON code artifacts. However, it remains empty without any fields or documentation, which suggests that it could be a placeholder for future use or intended to be used as a trigger without carrying additional data. The original review comment is verified as accurate, and there is no evidence of an oversight.

Scripts Executed

The following scripts were executed for the analysis:

Script:

    # Verify if there are any usages or planned usages for InstantUpdate message.
    rg --type proto "InstantUpdate"

Length of output: 61

Script:

    #!/bin/bash
    # Search for any occurrences of "InstantUpdate" in the codebase.
    rg "InstantUpdate"

Length of output: 1278
htlcswitch/link_test.go (1)
- 7000-7025: The test `TestLinkFlushApiGateStateIdempotence` checks for idempotence by asserting that successive calls without state change result in an error. This is a good practice to catch unnecessary calls that may indicate bugs.

The test logic is sound and follows good testing practices.
rpcserver.go (2)
- 2646-2649: The logic to check for active HTLCs before closing a channel has been updated to consider the `NoWait` flag. This change aligns with the PR's objective to allow channel closure without waiting for HTLCs to settle when `NoWait` is true.
- 2694-2705: The addition of a conditional block to handle the `NoWait` flag during channel closure is implemented correctly. It sends an instant update if the user does not want to wait for the transaction ID, which is consistent with the PR's description.

lnwallet/channel.go (2)
- 5469-5478: The implementation of `NeedCommitment` correctly acquires a read lock and defers its release, ensuring thread safety when accessing shared resources.
- 5477-5477: Verify that the `false` argument passed to `oweCommitment` correctly checks for outstanding remote updates and updates in the remote commit transaction that are not yet reflected in the local commit transaction, as described in the method comment.
```go
func (f *mockChannelLink) EnableAdds(linkDirection LinkDirection) error {
	// TODO(proofofkeags): Implement
	return nil
}
```
The `EnableAdds` method is marked with a TODO comment and not implemented.

Would you like me to implement this method or should we create a GitHub issue to track this task?
```go
func (f *mockChannelLink) DisableAdds(linkDirection LinkDirection) error {
	// TODO(proofofkeags): Implement
	return nil
}
```
The `DisableAdds` method is marked with a TODO comment and not implemented.

Would you like me to implement this method or should we create a GitHub issue to track this task?
```go
func (f *mockChannelLink) IsFlushing(linkDirection LinkDirection) bool {
	// TODO(proofofkeags): Implement
	return false
}
```
The `IsFlushing` method is marked with a TODO comment and returns a hardcoded `false`.

Would you like me to implement this method or should we create a GitHub issue to track this task?
```go
func (f *mockChannelLink) OnFlushedOnce(func()) {
	// TODO(proofofkeags): Implement
}
```
The `OnFlushedOnce` method is marked with a TODO comment and not implemented.

Would you like me to implement this method or should we create a GitHub issue to track this task?
```go
func (f *mockChannelLink) OnCommitOnce(LinkDirection, func()) {
	// TODO(proofofkeags): Implement
}
```
The `OnCommitOnce` method is marked with a TODO comment and not implemented.

Would you like me to implement this method or should we create a GitHub issue to track this task?
```go
link := p.fetchLinkFromKeyAndCid(chanID)
if link == nil {
	// If the link is nil then it means it was already
	// removed from the switch or it never existed in the
	// first place. The latter case is handled at the
	// beginning of this function, so in the case where it
	// has already been removed, we can skip adding the
	// commit hook to queue a Shutdown message.
	p.log.Warnf("link not found during attempted closure: "+
		"%v", chanID)
	return
}

link.OnCommitOnce(htlcswitch.Outgoing, func() {
	err := link.DisableAdds(htlcswitch.Outgoing)
	if err != nil {
		p.log.Warnf("outgoing link adds already "+
			"disabled: %v", link.ChanID())
	}

	p.queueMsg(shutdownMsg, nil)
})
```
The logic to disable outgoing link adds and queue the `Shutdown` message is duplicated. This should be refactored into a common method to avoid code duplication and potential inconsistencies in behavior.

```diff
-	link.OnCommitOnce(htlcswitch.Outgoing, func() {
-		err := link.DisableAdds(htlcswitch.Outgoing)
-		if err != nil {
-			p.log.Warnf("outgoing link adds already disabled: %v", link.ChanID())
-		}
-		p.queueMsg(shutdownMsg, nil)
-	})
+	p.disableOutgoingAndQueueShutdown(link, shutdownMsg)
```
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
```go
link := p.fetchLinkFromKeyAndCid(chanID)
if link == nil {
	// If the link is nil then it means it was already
	// removed from the switch or it never existed in the
	// first place. The latter case is handled at the
	// beginning of this function, so in the case where it
	// has already been removed, we can skip adding the
	// commit hook to queue a Shutdown message.
	p.log.Warnf("link not found during attempted closure: "+
		"%v", chanID)
	return
}
p.disableOutgoingAndQueueShutdown(link, shutdownMsg)
```
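The `disableOutgoingAndQueueShutdown` name above is a refactor suggested by the bot, not an existing method. Under that assumption, a self-contained sketch of what the helper might look like, with the peer and link interfaces pared down to the minimum (the real `OnCommitOnce` also takes a direction argument):

```go
package main

import (
	"errors"
	"fmt"
)

// shutdownLink is the minimal slice of the link interface the helper needs.
type shutdownLink interface {
	OnCommitOnce(hook func())
	DisableAdds() error
}

// peer models just enough of the peer to host the proposed helper.
type peer struct {
	queued []string
}

func (p *peer) queueMsg(msg string) { p.queued = append(p.queued, msg) }

// disableOutgoingAndQueueShutdown is a sketch of the deduplicated helper:
// once the next outgoing commitment is signed, stop accepting new adds and
// queue the Shutdown message.
func (p *peer) disableOutgoingAndQueueShutdown(l shutdownLink, shutdownMsg string) {
	l.OnCommitOnce(func() {
		if err := l.DisableAdds(); err != nil {
			fmt.Println("outgoing link adds already disabled:", err)
		}
		p.queueMsg(shutdownMsg)
	})
}

// fakeLink invokes commit hooks immediately, for demonstration.
type fakeLink struct{ disabled bool }

func (f *fakeLink) OnCommitOnce(hook func()) { hook() }

func (f *fakeLink) DisableAdds() error {
	if f.disabled {
		return errors.New("already disabled")
	}
	f.disabled = true
	return nil
}

func main() {
	p := &peer{}
	l := &fakeLink{}
	p.disableOutgoingAndQueueShutdown(l, "Shutdown")
	fmt.Println(len(p.queued), l.disabled) // prints "1 true"
}
```

Centralizing the hook body this way means both call sites queue `Shutdown` with identical ordering: disable adds first, then send the message.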
```go
	}

	// If we haven't finished close negotiations, then we'll continue as we
	// can't yet finalize the closure.
	if !closeFin {
		if _, err := chanCloser.ClosingTx(); err != nil {
```
The panic in the default case of the switch statement is too aggressive and could cause the entire peer to crash. It would be better to log an error and continue operation.

```diff
-	default:
-		panic("impossible closeMsg type")
+	default:
+		p.log.Errorf("received unknown close message type: %v", typed)
```
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
```go
if _, err := chanCloser.ClosingTx(); err != nil {
```
```proto
message InstantUpdate {
}
```
The `InstantUpdate` message is defined but not documented or used within the provided context. It's important to ensure that this addition is intentional and that there is a plan for its implementation.

Would you like assistance in defining the purpose and fields for `InstantUpdate`, or should this be tracked in a separate issue for future implementation?
itests are failing; aside from that, I think it's good to go once the RPC-level behavior is settled. My suggestion is to defer the `NoWait` changes and itests to a new PR, and merge the rest to limit the scope. There we can then handle this case properly.

The main thing to be mindful of is that if we just block without any updates, users might think it's stuck and try to issue the command again, so the new logic should reject that since we already have a pending close active. Eventually we might also want to add a new state that we show for the channel (closing pending, channel flushing, etc.). Again, I think we can defer that to limit the scope of this PR.
We do things this way to keep behavior consistent across REST, gRPC, and CLI. This was done to not alter the way we handle Recv calls from the streams.
3439385 to 69ef1b2 Compare
I think that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🦎
```go
	// active htlcs` or `link failed to shutdown` if we close the channel.
	// We need to investigate the order of settling the payments and
	// updating commitments to understand and fix .
	time.Sleep(5 * time.Second)
```
👍
Change Description
This change fixes a bug where we would fail channels if our peer sent a shutdown message while there were still live HTLCs on the channel. Fixes #6039
Steps to Test
Steps for reviewers to follow to test the change.
Pull Request Checklist
Testing
Code Style and Documentation
`[skip ci]` in the commit message for small changes.

📝 Please see our Contribution Guidelines for further guidance.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests
Refactor